Phone Number Validation And Normalization
This document explains the phone number validation and normalization system used by the application’s contact processing pipeline. It covers the cleaning algorithm that removes separators, handles international formats, enforces length constraints, and integrates with the main contact extraction workflows for CSV, Excel, and text files. It also documents the regex patterns used for phone number detection, fallback mechanisms for edge cases, and how the system behaves during manual number entry and file-based import.
The phone number validation and normalization logic is implemented in the Python backend and invoked from the Electron frontend via Pyodide. The relevant components are organized as follows:
Python backend utilities for validation and parsing
Flask API endpoints for file uploads and manual number parsing
Electron frontend that loads and executes Python logic in the browser using Pyodide
Supporting requirements and documentation
Phone number cleaning and normalization: Implemented in shared functions across multiple modules to ensure consistent behavior for manual input and file-based extraction.
Manual number parsing: Parses user-entered text into structured contacts, extracting names and numbers with flexible delimiters.
File-based contact extraction: Reads CSV, Excel, and text files, detects phone number columns, and normalizes entries.
Flask API: Exposes endpoints for validating individual numbers and parsing manual inputs, plus uploading files for batch processing.
Key responsibilities:
Normalize separators and international prefixes
Enforce minimum/maximum digit requirements
Detect and extract phone numbers from mixed-format text
Provide robust fallbacks for malformed inputs
The phone number validation pipeline operates in two primary modes:
Manual entry mode: The Electron frontend sends user-entered text to a Python function executed via Pyodide, which parses and normalizes numbers.
File-based mode: The Electron frontend triggers the Flask API, which reads uploaded files, extracts candidates, and normalizes them.
Cleaning Algorithm Overview
The cleaning algorithm performs the following steps:
Convert the input to a string and strip surrounding whitespace.
Remove common separators and punctuation.
Retain only digits and the plus sign.
Remove leading zeros from national numbers (those without a leading plus).
Add a leading plus to national numbers longer than 10 digits, which are treated as international.
Count remaining digits and enforce length constraints.
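The steps above can be sketched as a single function. This is a minimal sketch, not the application's actual code: the name `clean_phone_number`, the constants, and the dropping of stray plus signs inside the number are assumptions. A further assumption is that the "more than 10 digits" check is made before leading zeros are stripped, chosen because it agrees with the documented example of “01234567890” normalizing to “+1234567890”.

```python
import re
from typing import Optional

MIN_DIGITS = 7   # documented minimum digit count
MAX_DIGITS = 15  # documented maximum digit count


def clean_phone_number(raw: object) -> Optional[str]:
    """Return a normalized number, or None if it fails the length check.

    Hypothetical helper; the real function name and details may differ.
    """
    # Step 1: convert to string and strip surrounding whitespace.
    text = str(raw).strip()
    # Step 2: remove common separators (hyphens, whitespace, parentheses, periods).
    text = re.sub(r"[-\s().]", "", text)
    # Step 3: keep only digits and the plus sign.
    text = re.sub(r"[^0-9+]", "", text)

    has_plus = text.startswith("+")
    # Assumption: any plus sign that is not leading is discarded.
    digits = text.replace("+", "")

    if not has_plus:
        # Assumption: length is judged before zero stripping (see lead-in).
        is_long = len(digits) > 10
        # Step 4: remove leading zeros from national numbers.
        digits = digits.lstrip("0")
        # Step 5: long national numbers are treated as international.
        has_plus = is_long

    # Step 6: enforce the 7 to 15 digit range.
    if not MIN_DIGITS <= len(digits) <= MAX_DIGITS:
        return None
    return ("+" if has_plus else "") + digits
```

For example, `clean_phone_number("+44 7911 123456")` returns `"+447911123456"`, and an input with fewer than 7 digits returns `None`.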
Regex Patterns and Detection Logic
Separator removal: Removes hyphens, spaces, parentheses, and periods.
Non-digit retention: Keeps only digits and the plus sign.
National number handling: Removes leading zeros for national numbers.
International prefix addition: Adds a leading plus for long national numbers (>10 digits) that do not start with plus.
Length validation: Enforces a strict digit count range of 7 to 15 digits.
These patterns are consistently reused across:
Manual number parsing
File-based extraction
Standalone validation endpoint
Manual Number Parsing
Manual parsing supports:
Multiple separators: newline, comma, semicolon
Mixed name and number formats: “Name: Number”, “Number - Name”
Fallback detection: If neither side clearly matches a phone number, the system attempts to extract a candidate from the raw input using a broader pattern.
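The parsing behavior described above can be illustrated with a short sketch. All names here (`parse_manual_numbers`, `_clean`, `_split`, `CANDIDATE`) are hypothetical, `_clean` is a compact stand-in for the shared cleaning function, and the broad fallback pattern is an assumption, since the actual pattern is not shown in this document.

```python
import re
from typing import Dict, List, Optional, Tuple


def _clean(raw: str) -> Optional[str]:
    """Compact stand-in for the shared cleaner (hypothetical)."""
    text = re.sub(r"[^0-9+]", "", raw)
    has_plus = text.startswith("+")
    digits = text.replace("+", "")
    if not has_plus:
        is_long = len(digits) > 10      # assumption: checked before zero stripping
        digits = digits.lstrip("0")
        has_plus = is_long
    if not 7 <= len(digits) <= 15:
        return None
    return ("+" if has_plus else "") + digits


# Assumed broad pattern: optional plus, then digits mixed with separators.
CANDIDATE = re.compile(r"\+?\d[\d\s().-]{5,}\d")


def _split(entry: str) -> Tuple[str, str]:
    """Return (number_part, name_part) for the two documented layouts."""
    if ":" in entry:                    # "Name: Number"
        name, number = entry.split(":", 1)
        return number.strip(), name.strip()
    if " - " in entry:                  # "Number - Name"
        number, name = entry.split(" - ", 1)
        return number.strip(), name.strip()
    return entry, ""


def parse_manual_numbers(text: str) -> List[Dict[str, Optional[str]]]:
    contacts = []
    # Entries are separated by newlines, commas, or semicolons.
    for entry in re.split(r"[\n,;]+", text):
        entry = entry.strip()
        if not entry:
            continue
        number_part, name_part = _split(entry)
        number, name = _clean(number_part), name_part or None
        if number is None and name_part:
            # The expected side was not a number; try the other side.
            number, name = _clean(name_part), number_part or None
        if number is None:
            # Fallback: search the raw entry with the broader pattern.
            match = CANDIDATE.search(entry)
            number = _clean(match.group()) if match else None
            name = None
        if number is not None:
            contacts.append({"number": number, "name": name})
    return contacts
```

In this sketch, entries that yield no valid number are silently skipped; whether the real parser reports them instead is not stated here.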
File-Based Extraction (CSV, Excel, Text)
File-based extraction:
Detects likely phone and name columns by heuristics (column names containing “phone”, “number”, “mobile”, “cell”, “tel” for phone; “name”, “contact”, “person” for names).
Falls back to first/second columns if no matches are found.
Applies the same cleaning and validation logic to each candidate.
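The column heuristic above can be sketched in plain Python. The function names are hypothetical, and two details are assumptions: the fallback treats the first free column as the phone column and the next free column as the name column, and a header that matches both hint lists (such as "Contact Number") is claimed by the phone column.

```python
from typing import Optional, Sequence, Tuple

# Keywords taken from the heuristics described above.
PHONE_HINTS = ("phone", "number", "mobile", "cell", "tel")
NAME_HINTS = ("name", "contact", "person")


def _first_match(headers: Sequence[object], hints: Sequence[str],
                 skip: Optional[int] = None) -> Optional[int]:
    """Index of the first header containing any hint, ignoring `skip`."""
    for index, header in enumerate(headers):
        if index != skip and any(hint in str(header).lower() for hint in hints):
            return index
    return None


def _first_other(count: int, taken: Optional[int]) -> Optional[int]:
    """First column index that is not already taken, if any."""
    return next((i for i in range(count) if i != taken), None)


def pick_columns(headers: Sequence[object]) -> Tuple[Optional[int], Optional[int]]:
    """Return (phone_index, name_index); either may be None."""
    phone = _first_match(headers, PHONE_HINTS)
    name = _first_match(headers, NAME_HINTS, skip=phone)
    # Positional fallback when a keyword search finds nothing.
    if phone is None:
        phone = _first_other(len(headers), name)
    if name is None:
        name = _first_other(len(headers), phone)
    return phone, name
```

Substring matching is deliberately loose, so a header such as "Hotel" would also match the "tel" hint; a stricter variant could match whole words only.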
Validation Rules and Constraints
Minimum digits: 7
Maximum digits: 15
Accepted characters: digits and a single leading plus sign
International format: Numbers that are 11+ digits without a leading plus are auto-prefixed with a plus sign
National format: Leading zeros are removed from national numbers; numbers that already start with a plus sign are left as entered
These rules ensure compatibility with typical global phone number lengths while preserving user-friendly input formats.
Supported Input Formats and Normalized Outputs
Examples of supported inputs and their normalized outputs:
“(123) 456-7890” → “1234567890” (10 digits, so no plus is added)
“+1 234 567 8901” → “+12345678901”
“0123456789” → “123456789” (leading zero removed)
“01234567890” → “+1234567890” (auto-prefixed)
“+44 7911 123456” → “+447911123456”
Notes:
Numbers shorter than 7 digits or longer than 15 digits are rejected.
Names are preserved when provided in “Name: Number” or “Number - Name” formats.
Integration with the Main Contact Extraction Pipeline
Manual numbers: Executed in-browser via Pyodide, returning structured contacts immediately.
File uploads: Sent to the Flask API, processed server-side, and returned as contacts.
Consistent normalization: All paths funnel through the same cleaning/validation functions to ensure uniform behavior.
The phone number validation depends on:
Python standard libraries: re, json, sys
Pandas and xlrd/openpyxl for Excel/CSV parsing
Flask and CORS for the API layer
Regex-based cleaning operates on short strings, so its cost per number is negligible in Python.
Pandas-based CSV/Excel parsing is fast for typical files but loads the whole file into memory by default; consider chunking or streaming for very large datasets.
The API enforces a maximum upload size to prevent memory issues.
Manual parsing is lightweight and executed in the browser via Pyodide, minimizing server load.
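As one way to apply the chunking suggestion above, the following sketch streams a CSV file in fixed-size batches using only the standard library. The function name `iter_row_chunks` and the default chunk size are assumptions for illustration, not part of the application.

```python
import csv
from itertools import islice
from typing import Iterator, List, TextIO


def iter_row_chunks(handle: TextIO, chunk_size: int = 1000) -> Iterator[List[List[str]]]:
    """Yield lists of at most `chunk_size` rows without loading the whole file."""
    reader = csv.reader(handle)
    while True:
        # islice pulls the next batch of rows lazily from the reader.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk
```

Each chunk can then be cleaned and validated before the next one is read. pandas offers a comparable option through the `chunksize` parameter of `read_csv`.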
Common issues and resolutions:
Validation fails (returns None)
Cause: The number has fewer than 7 or more than 15 digits after cleaning.
Resolution: Check that 7 to 15 digits remain once separators are removed; for international numbers, include the country code with a leading plus.
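Leading zero stripped unexpectedly
Cause: A number without a leading plus is treated as national, so its leading zeros are removed; if it has more than 10 digits, a plus is also prefixed automatically.
Resolution: Enter the number in international format with a leading plus sign so that its digits are kept as entered.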
Mixed format not recognized
Cause: Ambiguous separators or short candidate segments.
Resolution: Use “Name: Number” or “Number - Name” formats; ensure at least 7 digits remain after cleaning.
File parsing errors
Cause: Unsupported file type or encoding issues.
Resolution: Confirm file extension (.csv, .txt, .xlsx, .xls) and UTF-8 encoding; verify column headers or structure.
The phone number validation and normalization system provides a robust, consistent pipeline across manual entry and file-based import. Its cleaning algorithm, regex-based detection, and strict length constraints ensure reliable processing of diverse input formats while maintaining international compatibility. The integration with the Electron frontend via Pyodide and the Flask API ensures seamless operation in both browser and server contexts.
API Endpoints Reference
POST /parse-manual-numbers
Request body: { numbers: string }
Response: { success: boolean, contacts: [{ number: string, name: string|null }], count: number, message: string }
POST /validate-number
Request body: { number: string }
Response: { valid: boolean, cleaned_number: string|null, original: string }
POST /upload
Form-data: file (txt, csv, xlsx, xls)
Response: { success: boolean, contacts: [{ number: string, name: string|null }], count: number, message: string }
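To show the request shapes listed above, here is a minimal sketch that builds JSON POST requests with the standard library. The base URL `http://localhost:5000`, the helper name `build_json_post`, and the use of a JSON-encoded body are assumptions; the actual host, port, and content type depend on how the Flask API is deployed.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: local Flask development server


def build_json_post(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one of the endpoints."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Request bodies follow the field names documented above.
validate_request = build_json_post("/validate-number", {"number": "+44 7911 123456"})
manual_request = build_json_post("/parse-manual-numbers", {"numbers": "Alice: +1 234 567 8901"})

# Sending requires a running API instance:
# with urllib.request.urlopen(validate_request) as response:
#     result = json.loads(response.read())
#     # Documented fields: result["valid"], result["cleaned_number"], result["original"]
```

The `/upload` endpoint expects multipart form-data rather than JSON, so it is not covered by this helper.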